IN THE CLAIMS

Please amend the claims as follows: 

1. (Original) A method of executing a plurality of threads within a single programmable
processor, the method comprising: 

receiving an instruction stream for each one of the plurality of threads at an execution unit;

executing instructions from each instruction stream received at the execution unit in a 
multistage pipeline within the execution unit such that, at any given time, the multistage pipeline 
includes instructions from different ones of the instruction streams in different stages of the 
multistage pipeline, the instructions including a single instruction that operates on a plurality of 
data elements in partitioned fields of at least one register to produce a catenated result, the at 
least one register having a register width and each of the data elements having an elemental 
width smaller than the register width. 

2. (Original) The method of claim 1 wherein the number of threads executing within the 
execution unit is prime relative to a rate of execution of a slowest functional unit in the execution 
unit. 

3. (Original) The method of claim 1 wherein the instructions from the plurality of 
instruction streams are executed in a round-robin manner. 

4. (Original) The method of claim 1 wherein only one thread from the plurality of 
threads can handle an exception at any given time. 

5. (Original) The method of claim 1 further comprising: 
decoding a second single instruction specifying a third and a fourth register each
containing a plurality of floating-point operands; 

multiplying the plurality of floating-point operands in the third register by the plurality of
operands in the fourth register to produce a plurality of products; and

providing the plurality of products to partitioned fields of a result register as a catenated result.

6. (Currently Amended) A computer-readable storage medium: 

having an instruction stream for each one of a plurality of threads that instruct a computer 
system to perform operations comprising, 

receiving an instruction stream for each one of the plurality of threads at an execution unit;

executing instructions from each instruction stream received at the execution unit in a 
multistage pipeline within the execution unit such that, at any given time, the multistage pipeline 
includes instructions from different ones of the instruction streams in different stages of the 
multistage pipeline, the instructions including a single instruction that operates on a plurality of 
data elements in partitioned fields of at least one register to produce a catenated result, the at 
least one register having a register width and each of the data elements having an elemental 
width smaller than the register width. 

7. (Currently Amended) The computer-readable storage medium of claim 6 wherein the
number of threads executing within the execution unit is prime relative to a rate of execution of a 
slowest functional unit in the execution unit. 

8. (Currently Amended) The computer-readable storage medium of claim 6 wherein the 
instructions from the plurality of instruction streams are executed in a round-robin manner. 

9. (Currently Amended) The computer-readable storage medium of claim 6 wherein 
only one thread from the plurality of threads can handle an exception at any given time. 

10-14. (Canceled) 